Adversarial Network Bottleneck Features for Noise Robust Speaker Verification
نویسندگان
چکیده
In this paper, we propose a noise robust bottleneck feature representation which is generated by an adversarial network (AN). The AN includes two cascade connected networks, an encoding network (EN) and a discriminative network (DN). Melfrequency cepstral coefficients (MFCCs) of clean and noisy speech are used as input to the EN and the output of the EN is used as the noise robust feature. The EN and DN are trained in turn, namely, when training the DN, noise types are selected as the training labels and when training the EN, all labels are set as the same, i.e., the clean speech label, which aims to make the AN features invariant to noise and thus achieve noise robustness. We evaluate the performance of the proposed feature on a Gaussian Mixture Model-Universal Background Model based speaker verification system, and make comparison to MFCC features of speech enhanced by short-time spectral amplitude minimum mean square error (STSA-MMSE) and deep neural network-based speech enhancement (DNN-SE) methods. Experimental results on the RSR2015 database show that the proposed AN bottleneck feature (AN-BN) dramatically outperforms the STSA-MMSE and DNN-SE based MFCCs for different noise types and signal-to-noise ratios. Furthermore, the AN-BN feature is able to improve the speaker verification performance under the clean condition.
منابع مشابه
An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition
Convolutional Neural Networks (CNNs) have been shown their performance in speech recognition systems for extracting features, and also acoustic modeling. In addition, CNNs have been used for robust speech recognition and competitive results have been reported. Convolutive Bottleneck Network (CBN) is a kind of CNNs which has a bottleneck layer among its fully connected layers. The bottleneck fea...
متن کاملFooling End-to-end Speaker Verification by Adversarial Examples
Automatic speaker verification systems are increasingly used as the primary means to authenticate costumers. Recently, it has been proposed to train speaker verification systems using end-to-end deep neural models. In this paper, we show that such systems are vulnerable to adversarial example attacks. Adversarial examples are generated by adding a peculiar noise to original speaker examples, in...
متن کاملConditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification
Improving speech system performance in noisy environments remains a challenging task, and speech enhancement (SE) is one of the effective techniques to solve the problem. Motivated by the promising results of generative adversarial networks (GANs) in a variety of image processing tasks, we explore the potential of conditional GANs (cGANs) for SE, and in particular, we make use of the image proc...
متن کاملCrafting Adversarial Examples For Speech Paralinguistics Applications
Computational paralinguistic analysis is increasingly being used in a wide range of applications, including securitysensitive applications such as speaker verification, deceptive speech detection, and medical diagnostics. While state-ofthe-art machine learning techniques, such as deep neural networks, can provide robust and accurate speech analysis, they are susceptible to adversarial attacks. ...
متن کاملAlternative Approaches to Neural Network Based Speaker Verification
Just like in other areas of automatic speech processing, feature extraction based on bottleneck neural networks was recently found very effective for the speaker verification task. However, better results are usually reported with more complex neural network architectures (e.g. stacked bottlenecks), which are difficult to reproduce. In this work, we experiment with the so called deep features, ...
متن کامل